causal diagram
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Peru (0.04)
- South America > Colombia (0.04)
- North America > Mexico (0.04)
- Health & Medicine > Consumer Health (1.00)
- Education (0.93)
- Government (0.92)
- Health & Medicine > Therapeutic Area (0.68)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Banking & Finance (0.93)
- Information Technology (0.67)
- Health & Medicine > Health Care Providers & Services (0.67)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Greenland (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- North America > Canada (0.04)
- North America > United States > Oregon > Benton County > Corvallis (0.04)
8fdd149fcaa7058caccc9c4ad5b0d89a-AuthorFeedback.pdf
Point #6 clarifies questions in "Correctness".
1. Are graphs necessary? (Q1-2, Q4) The departing point of our work is the realization that an imitating policy is generally underdetermined by the observational data alone. For concreteness, consider models M1, M2, unknown to researchers, where in M1, X ← U and Y ← X; in M2, X ← U and Y is determined by both X and U; in Mi, i = 1, 2, P(U = 0) = P(U = 1) = 0.5. We assume that Y and U are unobserved; Y is the reward.
[...]
Having said that, our methods could certainly be combined with GAIL to ensure both causal robustness and scalability with high-dimensional data, which we will acknowledge in the paper.
R2: (1) A causal diagram containing a latent reward Y generalizes the traditional settings of imitation learning.
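The underdetermination argument in the feedback can be checked with exact arithmetic. This is a minimal sketch, assuming M1 reads as X ← U, Y ← X, and assuming a hypothetical mechanism Y ← 1[X = U] for M2, because the operator combining X and U in M2 is lost in the extracted text. The imitator copies the observed P(X) but cannot see U.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # P(U = 0) = P(U = 1) = 0.5 in both models


def reward_m1(x: int, u: int) -> int:
    # M1: Y <- X
    return x


def reward_m2(x: int, u: int) -> int:
    # M2: hypothetical mechanism Y <- 1[X = U], chosen only for illustration
    return int(x == u)


def observed_p_x1() -> Fraction:
    # Both models use X <- U, so the only observed variable X has the same law.
    return sum((HALF for u in (0, 1) if u == 1), Fraction(0))


def expert_value(reward) -> Fraction:
    # The demonstrator acts with X <- U, i.e. it sees the unobserved U.
    return sum((HALF * reward(u, u) for u in (0, 1)), Fraction(0))


def imitator_value(reward, p_x1: Fraction) -> Fraction:
    # The imitator copies P(X) but cannot condition on U, so X is independent of U.
    total = Fraction(0)
    for u in (0, 1):
        for x, p_x in ((0, 1 - p_x1), (1, p_x1)):
            total += HALF * p_x * reward(x, u)
    return total


p = observed_p_x1()
for name, reward in (("M1", reward_m1), ("M2", reward_m2)):
    print(name, "expert:", expert_value(reward), "imitator:", imitator_value(reward, p))
```

Both models give the same observational P(X = 1) = 1/2, yet the imitator matches the expert in M1 (1/2 vs 1/2) and falls short in M2 (1/2 vs 1), so the data alone cannot tell whether imitation is safe.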
Nested Counterfactual Identification from Arbitrary Surrogate Experiments
In this paper, we study the identification of nested counterfactuals from an arbitrary combination of observations and experiments. Specifically, building on a more explicit definition of nested counterfactuals, we prove the counterfactual unnesting theorem (CUT), which allows one to map arbitrary nested counterfactuals to unnested ones.
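A simple instance of unnesting is P(Y_{x, Z_{x'}} = y) = Σ_z P(Y_{x,z} = y, Z_{x'} = z): whenever Z_{x'} takes value z, the nested counterfactual coincides with Y_{x,z}. The sketch below checks this numerically in a hypothetical binary SCM (the mechanisms f_z, f_y and the exogenous probabilities are invented for illustration) by enumerating the exogenous variables. It only verifies the identity in a fully specified model; it is not the paper's identification procedure, which works from observational and experimental distributions without access to the SCM.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy SCM over binary variables (illustration only):
#   U1 ~ Bernoulli(1/3), U2 ~ Bernoulli(1/2), independent exogenous noise
#   Z <- X xor U1
#   Y <- (X and Z) or (U1 and U2)   # U1 is shared, so Z and Y are confounded
P_U1 = Fraction(1, 3)
P_U2 = Fraction(1, 2)


def exogenous():
    """Yield every exogenous setting (u1, u2) with its probability."""
    for u1, u2 in product((0, 1), repeat=2):
        p1 = P_U1 if u1 else 1 - P_U1
        p2 = P_U2 if u2 else 1 - P_U2
        yield (u1, u2), p1 * p2


def f_z(x, u):
    return x ^ u[0]


def f_y(x, z, u):
    return (x & z) | (u[0] & u[1])


def nested(y, x, x_prime):
    """P(Y_{x, Z_{x'}} = y): compute Z under do(X = x'), feed it to Y under do(X = x)."""
    return sum(
        (p for u, p in exogenous() if f_y(x, f_z(x_prime, u), u) == y), Fraction(0)
    )


def unnested(y, x, x_prime):
    """sum_z P(Y_{x,z} = y, Z_{x'} = z): a joint over two ordinary counterfactuals."""
    total = Fraction(0)
    for z in (0, 1):
        total += sum(
            (p for u, p in exogenous() if f_y(x, z, u) == y and f_z(x_prime, u) == z),
            Fraction(0),
        )
    return total


for x, x_prime, y in product((0, 1), repeat=3):
    assert nested(y, x, x_prime) == unnested(y, x, x_prime)
print("nested and unnested expressions agree on every (x, x', y)")
```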
- North America > United States > Oregon > Benton County > Corvallis (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- Research Report > Experimental Study (0.94)
- Research Report > Strength High (0.68)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
On Transportability for Structural Causal Bandits
Intelligent agents equipped with causal knowledge can optimize their action spaces to avoid unnecessary exploration. The structural causal bandit framework provides a graphical characterization for identifying actions that are unable to maximize rewards by leveraging prior knowledge of the underlying causal structure. While such knowledge enables an agent to estimate the expected rewards of certain actions based on others in online interactions, there has been little guidance on how to transfer information inferred from arbitrary combinations of datasets collected under different conditions (observational or experimental) and from heterogeneous environments. In this paper, we investigate the structural causal bandit with transportability, where priors from the source environments are fused to enhance learning in the deployment setting. We demonstrate that it is possible to exploit invariances across environments to consistently improve learning. The resulting bandit algorithm achieves a sub-linear regret bound with an explicit dependence on the informativeness of the prior data, and it may outperform standard bandit approaches that rely solely on online learning.
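The paper's algorithm and regret bound are not reproduced here. As a much simpler stand-in for the intuition that informative prior data reduce regret, the sketch below warm-starts plain UCB1 on Bernoulli arms with pseudo-observations. The function name ucb1_with_prior and the (mean_estimate, pseudo_count) encoding of transported knowledge are hypothetical; the sketch performs no graphical pruning of arms and does not check which cross-environment invariances would justify the transfer.

```python
import math
import random
from typing import List, Sequence, Tuple


def ucb1_with_prior(
    true_means: Sequence[float],
    horizon: int,
    prior: Sequence[Tuple[float, int]] = (),
    seed: int = 0,
) -> Tuple[List[int], float]:
    """UCB1 on Bernoulli arms, optionally warm-started with prior pseudo-observations.

    prior[i] = (mean_estimate, pseudo_count) is a hypothetical stand-in for
    knowledge transported from source environments to arm i.
    Returns (online pulls per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    sums = [0.0] * k    # reward totals, prior pseudo-observations included
    counts = [0] * k    # observation counts, prior pseudo-observations included
    for i, (mean, n) in enumerate(prior):
        sums[i] += mean * n
        counts[i] += n
    pulls = [0] * k     # online pulls only
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            arm = untried[0]  # every arm needs at least one observation first
        else:
            log_n = math.log(sum(counts))
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2 * log_n / counts[i]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
        pulls[arm] += 1
        regret += best - true_means[arm]  # gap of the chosen arm
    return pulls, regret


cold_pulls, cold_regret = ucb1_with_prior([0.2, 0.8], horizon=200)
warm_pulls, warm_regret = ucb1_with_prior(
    [0.2, 0.8], horizon=200, prior=[(0.2, 1000), (0.8, 1000)]
)
print("cold:", cold_pulls, round(cold_regret, 2))
print("warm:", warm_pulls, round(warm_regret, 2))
```

With an accurate prior of 1000 pseudo-observations per arm, the warm-started run never pulls the 0.2 arm in 200 rounds, while the cold start must try it at least once. A misleading prior can have the opposite effect, which is in the spirit of a regret bound that depends on how informative the prior data are.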
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Virginia (0.04)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.45)